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Abstract 

While the Cox Proportional Hazard model is a fundamental tool in sur¬ 
vival analysis, its semi-parametric nature precludes the estimation of up¬ 
per survival quantiles in the presence of heavy censoring. In contrast, 
fully parametric models do not suffer from this issue - at the expense 
of additional modeling assumptions. In this article, we extend a popular 
family of parametric models which make the Accelerated Failure Time 
(AFT) assumption to account for heteroscedasticity in the log-survival 
times. This adds substantial modeling flexibility, and we show how to 
easily and rapidly compute maximum likelihood estimators for the pro¬ 
posed model in the presence of censoring. In an application to the analysis 
of a colon cancer study, we found that heteroscedastic modeling greatly 
diminished the significance of outliers, while even slightly decreasing the 
average size of prediction intervals. 

Keywords: Accelerated Failure Time assumption, Heteroscedastic model¬ 
ing, Right-censored lifetimes, Expectation-Conditional Maximization 


Introduction 


When modeling the impact on failure times T of potential predictors X = 
(W, • ■ ■, Xd)^ statisticians have a number of tools at their disp osal. Perha ps the 
most famous is the Cox Proportional Hazards (CPH) model (jCoxl . 119721) . The 
CPH model is highly flexible, straightforward to fit, and accommodates censored 
survivals. However, the semi-parametric CPH approach typically used in prac¬ 
tice cannot estimate the conditional survival function S{t \ x) = P{T > t\X = 
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x) fo r t greater than the largest observed survival time (IMoeschberger and Kleinl . 
1985ll. This becomes an important concern when the censoring rate is high (e.g., 
Sv and Tavlor . 200Clll . 

In contrast, fully parametric models do not suffer from this issue. A pop¬ 
ular family of parametric models make the Accelerated Failure Time (AFT) 
assumption, namely that the condional distribution of the survival times is 

log(T)=f,(X)+e, 


where e ^ fn(t) is a random va riable which does not depend on X (IWeil . 11992 : 
Kalbfleisch and Prenticel 2002 1. AFT models have an appealing interpretation: 
the relation between the conditional survival function S{t\x) of T and the 
“baseline” survival function So{t) of e is simply 

S{t\x) = So{X{x) ■ t), where X(x) = 

However, as with any parametric model, incorrect specification of n{x) and fo(t) 
can adversely affect inferential results. 

The purpose of this article is to relax the AFT model’s homoscedasticity 
assumption on the log-survivals. Much work has been done on this in the 


mode ling” 


Hougaard. 

1991: 

Keiding et ah. 

1997; 

Pan, 

2001; 


Zhang and Peni 


2 OO 9 II . We adopt instead a conditionally heteroscedastic approach by considering 
a model of the form 

\og{T) = ^i{X) + a{X) ■ e. (1) 

Estimation for locati on-scale type regression model s such as (U) ha s been ex¬ 
tensively studied; see Muller and Stadtmiillei J 1987 h Cai and Wang J 2008ll for 
non-p a rametric and Hsieh (1996); Zeng and Lin ()2007 1: Zhang and Davidian 
( 2008h : Sii et al. ( 2012 1 for semi-paramet ric approaches. Indeed, model dH) can 
be vi ewed as a quantile regression model (IKoenker and Bassettl . Il978l : iKoenkeil 
2 OO 5 II . For specific ^{x) and cr(x), qu antile regress i on estimates h ave been de¬ 
veloped to accou nt for right-censoring (Powell . 19861 Portnoy , 2003ll and applied 
to survival data ( Peng and Huangl . I2OO8II . One drawback of many quantile re¬ 
gression models which do not specify the distribution of e is the difficulty of con¬ 
structing confid ence intervals fo r the model paramete rs an d quantile estimates: 


see for instance iKoenkeii ( 1994f) : lAngelis et al.l (1993); and iKocherginskv et al 


( 200,^ for a review of several existing methods. 


Fu ll y parametric a pproaches to (P) have been studied by e.g.. lBoscardin and Gelman 
1 1996ll : SmvthI 1 2002ll . Following these authors, we consider the model formula¬ 
tion 

^{x) = x'P, (7'^{x) = exp{x'j), e~A/’(0,1). (2) 

While the adequacy of a specific failure time model undoubtedly varies from one 
dataset to another, here we shall advocate that the Heteroscedastic Accelerated 
Failure Time (HAFT) model described by (P is an attractive addition to the 
survival modeling toolkit for a number of reasons: 
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1. Interpretability. As with the homoscedastic AFT model, the conditional 
survival function of the heteroscedastic HAFT model can be obtained by 
a simple transformation of the baseline survival function of e: 


S{t\x) = So(^X{x) ■ where 


l{x) = 

a{x) = l/cr(a;). 


For the HAFT model dD) we are proposing, it is easy to evaluate S'(t | a;) 
for any combination of t and x using the quantile function of a standard 
Normal distribution. 

2. Tractability. The HAFT model enjoys a simple al gorithm for computing 
maxi mum likelihood estimators of /3 and 7 (e.g., Smvth . 19891 Verbvla . 


1993!) - full details and an implementation using standard statistical soft¬ 
ware are provided in Section [TJ Moreover, confidence intervals for the 
model parameters and quantile estimates can readily be constructed from 
the Hessian matrix of the log-likelihood. 

3. Censoring. The HAFT model ([2]) admits a simple Expectat ion-Conditional 
Maximization (ECM) algorithm ( Meng and Rubin , Il993h to estimate (3 
and 7 in the presence of right-censored failure times (described in Sec¬ 
tion II.2|) . 

4. Flexibility. As a generalization of the homoscedasctic case, the HAFT 
model adds considerable flexibility to the modeling of failure times. We 
ill ustrate this with da ta from the well-known colon cancer clinical trial 
of iLaurie et al.l (|l989l i. The HAFT model was found to have far fewer 


outliers than its homoscedastic counterpart, while actually decreasing the 
average size of prediction intervals. 


Elaborating on these points, the remainder of this article is organized as 
follows. Parametric estimation for the HAFT model in the presence of censoring 
is detailed in Section [TJ A comparison of its performance on the colon cancer 
data relative to the homoscedastic AFT model is presented in Section |5| We 
conclude with a discussion of further work in Section |3| 


1 Parameter Estimation for the HAFT Model 

Let Ri = log(Ti) and Xi = (Xu,... ,XiD} denote the log-survival time and 
predictors for subject i. For ease of exposition, we decompose the covariates of 
the HAFT model into their mean and variance effects: 

R, I X, exp(Z'7)), (3) 

where Wi = {Wii,...,Wip) = f(Xi) and Zi = The 

model parameters are (3 = (/3i,...,/3p) and 7 = ( 71 ,..., 7 ^), and the log- 
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likelihood function is 


1 ” 

mi\R.x) = --Y, 


{R, - WlOr 
exp(^'7) 


+ Z’j 


where R = {Ri, ..., i?„). 

1.1 Maximum Likelihood Estimation Without Censoring 

We first present a method of calculating the MLE of (/3, 7 ) for complete (uncen¬ 
sored) cases. For fixed 7 , the conditional log-likelihood for the mean parameters 
is 


£{f3\^,R,X) = --J 2 


{R, - Wl(3) 


21 


where af = exp(Z' 7 ). 


This is the log-likelihood function of a Normal linear model with known variances 
al With = [Wi I ■ • • I W„] , it is maximized at 


(3 = {W'flWy^W'flR, where = diag(c 


0 - 


(4) 


For fixed /3, the conditional log-likelihood of the variance parameters is 


iij\f3,R,X) = --J 2 


U, 


exp(Z' 7 ) 




where Ui = {Ri-W'f3f. {5) 


This has long been recognized as the log-likelihood of a Generalized Linear 
Model (GLM) for a Gamma distribution with logarithmic link function (e.g., 
Nelder and Preeibon . 1987 : Smyth, 1989ll . The latter provides a Fisher scoring 
algor ithm which iteratively updates /3 and 7 and conv erges to the M LF (ISmvth , 
1989fl . While further accelerations are possible (e.g., Smyth, 2002 ). the maxi¬ 
mization of GLM likelihoods at present can be easily accomplished with tools 
from standard regression software. For example, with Unxi = ■ ■ ■ y Un) and 

the matrix Znxq = \_Zi | | , the maximum 7 of ([5]) can be computed in 

R with the command 


glm(C/~ .Z - 1, family = Gamma("log")). (6) 

We found that alternating between updates (|3]) and m converged very quickly 
to the MLF of (/3,7) in the data analysis of Section!^ 


1.2 An ECM Algorithm for Censored Observations 

A common feature of lifetime survival data is the censoring of observations. 
Instead of observing the actual (log) failure time Ri, we observe Yi, where 
Yi = min(i?i, Ci) and Ci is the censoring time. We also observe Si = i{Ri < Ci), 
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an indicator variable for whether the survival time of subject i is censored or 
not {5i = 1 means uncensored). Assuming that R and C are conditionally in¬ 
dependent given the covariates, h ere we describe a n Exp ectation-Conditional 
Maximization (ECM) algorithm ( Meng and Rubin . 1993ll whi c h ext ends the 
well-known Expectation Maximization algorithm (e.g., lAitkinl . Il98lli for the 
censored homoscedastic linear model to our heteroscedastic setting. 

Let Y = (Fi,..., Fi), 5 = (di,..., F), and 7 ^*^) denote the parameter 
values at iteration t. 


• E-step: We have 

Qt{(3,j)=E[£{f3n\R,X) I y,d,X,/3(‘),7W] 

2 


= E 




2=1 


exp(Z' 7 ) 


1 




S. - 2fl,ir//3 + (Wlm 


2=1 
2 




1 


2=1 


exp(Z' 7 ) 




2=1 





= exp(Z' 7 / 2 ) 


Si = l 

s^ = o, 

(t) 


5,: = 




S, = 1 


(af ))2g(F) + 2pf'>R, S, = 0, 


F = 


Y — 11 ^ 
ri 

Jt) 


it) 


/(a) = 


‘Pja) 

$(-«)’ 


g{a) = 1 + 


a(p{a) 

^(-a) 


and (p{a) and <i>(a) are the PDF and CDF of a standard Normal dis¬ 
tribution. Indeed, for Z ^ A/’(0,1) we have /(a) = E[Z \ Z > a] and 
g{a) = E[Z‘^\Z> a]. 


• M-step for /3: The conditional maximum = argmax^ Qt{P, 7 ^*^) is 

given by the weighted linear regression estimate: 

pit+i) = (w'np)W)-^w'np^R, where = diag^(crj*^)^,..., 


M-step for 7 : Similarly, 7 ^+ 1 ) maximizes the objective function 


Qt(/3(*+'\7) = -2 


U, 


exp(IF/7) 


Wh 


where Ui = F —2.RiW//3(*''"^i + (W//3i*+^i)^. Once again this corresponds 
to the likelihood of the Gamma GLM with log link function, which can 
be maximized using standard regression software. 
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2 Application to the Colon Cancer Study 


The study of Ihaurie et al. ( 19891) and iMoertel et all ( 19901) is one of the first 
successful clinical trials of adjuvant chemotherapy for colon cancer. The dataset 
contained N = 888 patients with colon carcinoma randomly assigned to the 
control group (no treatment) or one of two chemotherapy treatment groups: 
levamisol combined with fluorouracil or levamisole alone. In addition to the the 
treatment group, 10 covariates (e.g., gender, age, severity of cancer) for each 
subject were also recorded. Over half the survival times in the sample were 
right-censored (A^cens = 458). 

As a basis of comparison for the HAFT model, we consider both the ho- 
moscedastic AFT with log-Normal survivals and a Cox proportional hazards 
(CPH) model. Stepwise regression based on the AIC was employed to select the 
covariates in the AFT and CPH models amongst all main effects and second or¬ 
der interactions. The HAFT model was given the same location covariates as its 
homoscedastic counterpart, and for simplicity we set the shape covariates to all 
the main effects. Parameter estimates for the fitted models are in Appendix lAl 
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Figure 1: Left: Estimated survival curves for three randomly selected patients. 
Right: Lowest estimated survivals for the CPH model (mean indicated by 

dashed line). 
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2.1 Model Comparisons and Goodness-of-Fit 

Figure [T] displays the estimated survival curves for all three models for several 
randomly selected subjects. Due to the high proportion of censored observa¬ 
tions, the semi-parametric CPH model does not produce estimates for the upper 
survival quantiles. Indeed the CPH model truncates more than half of the pre¬ 
dicted survival curves above 40% survival. 

The AIC statistics for the parametric models are 7680.2 for AFT and 7671.0 
for HAFT (the AIC for the CPH model is calculated from a partial likelihood 
and thus cannot be compared directly to the other two). To further compare 
AFT to its heteroscedastic extension, we consider the following goodness-of-fit 
tests for the model residuals. 

For a given fitted model with parameters 0, we would like to compare the 
survival time of each patient to its predictive distribution p{Ti \Xi,9). In 
the absence of censoring, the HAFT model residuals are 


R^ - Wl$ 

exp(Z' 7 / 2 )- 


With censoring, however, the observed data is not Ri but {Yi,Si), with 
Yi = min(i?i,C'i) and 5i = i[i?i < G]. A common approach to defining 
model residuals in the presence of censoring is to impute the missing survivals 
times ( Hillisl . Il995[) . That is, each censored observation is given a stochastic 
residual Si, computed as above but with Ri drawn from its truncated condi¬ 
tional distribution, 

Ri R > Yi,Xi,6). 


The resulting Hillis residuals are approximately standard Normal under a cor¬ 
rectly specified model. However, in the presence of heavy censoring as in our 
study, the Hillis residuals which are simulated from the posited model can eas¬ 
ily overwhelm the uncensored data, and thus significantly decrease the power of 
goodness-of-fit tests. 

Instead, we opted to fit a second parametric model to the conditional distri¬ 
bution of censoring times. While this requires additional assumptions, the large 
number of censored observations provided sufficient information to select AFT 
and HAFT candidate models for p{C | X), exactly as for the survival distribu¬ 
tion but with indictor 1 — 5. 

Let fR\x{r\x), FR\x{r\x) and /c|x(c|a;), Fc\x{c\x) denote the con¬ 
dition PDF and CDF of survival and censoring distributions respectively. Then 
the conditional PDF of the observed time Y is 


/y|5.x(y|<J=l,^ = a:)oc/fl|x(y|a;)- (l - Fb | x(y I ®)), (7) 

/v|5,x( 2/|<5 = 0,X = a;) oc/c|x( 2 /|a^) • (l - F’fl | x(y I a^)), (8) 


for uncensored and censored observations respectively. 
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While the conditional distributions for the AFT and HAFT models are Nor¬ 
mal, distributions © and dS]) are not. To construct residuals for this setting, 
we mapped each observation Yi to its predicted Normal quantile: 

e, :=$-i(p(F<r,|5„X„0)), (9) 

where P{Y < y\5,X,6) is the CDF associated with the PDFs in ([7]) and (|8]). 
The inner term in @ thus corresponds to the probability integral transform of 
Yi , such that the it are approximately standard Normal when both the survival 
and censoring models are correctly chosen. 

2.2 Results 

Figure [5] displays the observation times along with their predicted means and 
95% prediction intervals. The data are grouped by censoring status, and the 
subjects are sorted on the x-axis in increasing order of the AFT model’s pre¬ 
dictions. (This is why the HAFT model predictions appear to be more erratic.) 


AFT Model (Uncensored) 



Subject 

AFT Model (Censored) 


o 



HAFT Model (Uncensored) 



Subject 

HAFT Model (Censored) 



Figure 2: Observation times along with their predicted means and 95% 

prediction intervals. 




















AFT Model 



Residuals 


Figure 3: Residuals for AFT and HAFT models as computed by ([9]). The solid 
line corresponds to the A/’(0,1) residual distribution expected under the 
hypothesis that the model is correct. 
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Figure 4: Widths of 95% prediction intervals for the AFT and HAFT models. 


For the uncensored observations, the HAFT model has noticeably fewer out¬ 
liers (indicated dashed lines). Both models have roughly the same outliers in the 
censored observations, but these are accompanied by wider prediction intervals 
with HAFT. This can also be seen from the model residuals ii, calculated as 
in which are shown in Figure O These suggest that accounting for het- 
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eroscedasticity in the AFT considerably improves the model’s fit, eliminating 
many of the extreme values. 

While the HAFT model fits the data better by increasing its prediction 
intervals, it can also decrease prediction intervals for the non-extreme cases. 
The width of both models’ 95% prediction intervals are plotted in Figured! It 
is noteworthy that while the range of the HAFT prediction intervals is larger 
than AFT’s, its average prediction interval is actually smaller. 


3 Discussion 


The heteroscedastic AFT model we have proposed is a natural extension to 
its homoscedastic originator, and benefits from tractable computations in the 
presence of right-censoring. In an analysis of the colon cancer study which fea¬ 
tures heavy censoring, the HAFT model was found to substantially diminish the 
significance of model outliers, without increasing the average size of prediction 
intervals. 

The results of this study are promising for the HAFT model, prompting 
several possible extensions to more complex models or with fewer assumptions. 
For instance, heavy-taile d residu als could be incorporated via the t-distribution. 


see 


Arellano-Valle et al.l (|2012f) . Alternately one might choose not to spec- 


19921 : Zhang and Davidian . 2008t Zhou et ah . 2012 1 can be adapted to the het- 


ify the residual distribution, in which case a number of semi-parametric ho- 
moscedastic AFT models fe .g.. iBuc klev an d JamesL 19791 Robins and TsiatiS . 


eroscedastic setting. Similarly, it is possible to embed the HAFT model within 
more complex models to account for individual-level random effects or com¬ 
peting risks. It is hoped that the computational simplicity of the basic HAFT 
model can be leveraged to design effective Monte Carlo inference strategies in 
these more sophisticated settings. 
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A Parameters of Fitted Survival Models 


Table 1: Coefficients and Variances 



1 HAFT(sd) 

AFT(sd) 

CPH(sd) 


1 Location Parameters (/3) 

Hazard Parameters 

(Intercept) 

10.84(1.4e+00) 

10.48(7.9e-01) 


rxLev 

0.19(2.6e-02) 

0.07(2.9e-02) 

-0.24(1.9e-01) 

rxLev+5FU 

0.27(2.9e-02) 

0.04(2.6e-02) 

-0.19(1.8e-01) 

sex 

0.69(2.le-01) 

0.94(2.5e-01) 

-1.04(5.2e-01) 

age 

-0.03(1.8e-04) 

-0.02(1.5e-04) 

0.03(1.4e-02) 

obstruct 

-0.25(2.2e-02) 

-0.44(1.5e-02) 

0.09(1.9e-01) 

perfor 

-0.30(3.7e-02) 

-0.21(1.le-01) 

0.33(3.1e-01) 

adhere 

-1.21(5.2e-01) 

-1.30(6.7e-01) 

0.51(2.0e-01) 

nodes 

-0.17(1.4e-03) 

-0.15(1.9e-03) 

0.14(4.3e-02) 

dif[moder] 

-1.02(7.4e-01) 

-0.74(6.6e-01) 

1.24(9.4e-01) 

dif[poor] 

-2.81(9.5e-01) 

-2.55(8.6e-01) 

3.55(1.Oe-l-00) 

ext [muscle] 

-0.64(8.3e-01) 

-0.24(2.0e-01) 

0.39(6.1e-01) 

ext [serosa] 

-1.01(8.1e-01) 

-0.79(1.8e-01) 

0.91(5.9e-01) 

ext[cstruct] 

-1.53(8.4e-01) 

-1.25(2.3e-01) 

1.28(6.2e-01) 

surg 

-0.20(1.2e-02) 

-0.24(1.le-02) 

0.21(1.le-01) 

node4 

-0.33(3.5e-02) 

-0.44(3.6e-02) 

0.48(1.9e-01) 

I(nodes^) 

0.004(1.5e-06) 

0.004(3.0e-06) 

-0.004(1.7e-03) 

age:dif[moder] 

0.02(1.8e-04) 

0.02(1.7e-04) 

-0.02(1.5e-02) 

age:dif[poor] 

0.04(2.4e-04) 

0.04(2.2e-04) 

-0.06(1.6e-02) 

obstruct iperfor 

0.73(1.5e-01) 

1.19(3.5e-01) 

-1.19(6.1e-01) 

sexiage 

-0.01(5.5e-05) 

-0.02(6.4e-05) 

0.02(8.2e-03) 

rxLevisex 

-0.11(4.4e-02) 

-0.13(5.3e-02) 

0.10(2.3e-01) 

rxLeviobstruct 



0.61(2.8e-01) 

rxLev+5FU:sex 

0.44(6. le-02) 

0.39(5.6e-02) 

-0.44(2.5e-01) 

rxLev+5FU lobstruct 



0.04(3.1e-01) 

age:adhere 

0.02(7.7e-05) 

0.02(1.2e-04) 


adhere:dif[moder] 

-0.05(2.3e-01) 

-0.12(2.8e-01) 


adhere:dif[poor] 

0.45(3.Oe-01) 

0.57(3.4e-01) 


adhereinodes 



-0.06(3.5e-02) 


1 Shape Parameters (7) 


(Intercept) 

0.01(1.le-l-00) 



rxLev 

0.29(3.2e-02) 



rxLev+5FU 

0.66(3.8e-02) 



sex 

0.02(2.4e-02) 



age 

O.Olhoe-05) 



obstruct 

0.46(3.5e-02) 



perfor 

-1.16(2.0e-01) 



adhere 

-0.17(4.2e-02) 



nodes 

-0.06(1.le-03) 



dif[moder] 

-0.25(7.3e-02) 



dif[poor] 

0.09(9.5e-02) 



ext [muscle] 

-0.35(1.le-l-00) 



ext [serosa] 

-0.00006(9.9e-01) 



ext[cstruct] 

0.07(1. le-l-00) 



surg 

0.07(2.7e-02) 



node4 

0.20(6.7e-02) 
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